A Note On Estimating the Spectral Norm of A Matrix Efficiently 
Abstract

We give an efficient algorithm which can obtain a relative error approximation to the spectral
norm of a matrix, combining the power iteration method with some techniques from matrix
reconstruction which use random sampling.
1 Introduction

For a matrix $A \in \mathbb{R}^{n \times d}$, $n \ge d$, we consider estimating its spectral norm $\|A\| = \max_{\|x\|=1} \|Ax\|$.
We give an algorithm to obtain a relative error approximation to $\|A\|$ based on subsampling $A$
and then applying the power iteration. The algorithm is randomized, simple, and efficient. A more
sophisticated method, e.g. a Lanczos method in lieu of the power method, could give slight
improvements with similar asymptotic running times; we do not pursue that here. It is
also known that no deterministic algorithm can solve this problem (Kuczyński and Woźniakowski,
1992), and so one must resort to a randomized algorithm.



O'Leary et al. (1979) showed good performance of the power method, and Kuczyński and Woźniakowski
(1992) gave a detailed analysis of the expected and high probability convergence of the power
method; Woolfe et al. (2008) considered a randomized test for determining if the spectral norm is
above a value using multiple random starts. We extend the results in Kuczyński and Woźniakowski
(1992) to give a more efficient algorithm. We will give a simplified, elementary proof of the
probabilistic convergence of the power method, a result asymptotically comparable to the one in
Kuczyński and Woźniakowski (1992); we will combine this with a down-sampling of $A$ that
preserves the spectral norm to obtain a randomized algorithm that realizes the claim in Theorem 1.
We quantify the running time in terms of $\mathrm{nnz}(A)$ (the number of non-zero elements in $A$) and a
parameter $r$, where



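$$r = \min\left\{ \frac{\mathrm{nnz}(A)}{\epsilon} \log\frac{d}{\epsilon\delta},\ \frac{d^2}{\epsilon^3} \log\frac{d}{\delta} \log\frac{d}{\epsilon\delta} \right\},$$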



and $\epsilon$ is the relative error tolerance and $\delta$ is the failure probability.

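Theorem 1. Given $A \in \mathbb{R}^{n \times d}$, there is an algorithm which runs in $O(\mathrm{nnz}(A) + r)$ time and outputs an
estimate $\tilde{\sigma}^2$ which, with probability at least $1 - \delta$, satisfies

$$(1 - \epsilon)\|A\|^2 \le \tilde{\sigma}^2 \le (1 + \epsilon)\|A\|^2.$$
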
An estimate of the spectral norm can be used to efficiently compute the effective or numerical
rank $\rho$ of $A$, $\rho = \sum_{i,j} A_{ij}^2 / \|A\|^2$; $\rho$ is useful in developing efficient matrix algorithms, such as matrix
multiplication (Magen and Zouzias, 2010; Magdon-Ismail, 2010). Notice that the running time is
significantly faster than the $O(nd^2)$ required to compute the spectral norm exactly via the singular
value decomposition of $A$. The algorithm, along with its proof of correctness, is described in the
next section. The first term in $r$ is implied by Kuczyński and Woźniakowski (1992), so we focus
on the second term.
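As a small illustration of the numerical rank, the sketch below computes the ratio of the squared Frobenius norm to the squared spectral norm in plain Python. The function name `numerical_rank` and its argument `spectral_norm_sq` are hypothetical names chosen for this example; any estimate of $\|A\|^2$ could be supplied in place of the exact value used here.

```python
def numerical_rank(A, spectral_norm_sq):
    """Return rho = ||A||_F^2 / ||A||^2 for a matrix given as a list of rows.

    spectral_norm_sq is assumed to be (an estimate of) ||A||^2.
    """
    frob_sq = sum(x * x for row in A for x in row)
    return frob_sq / spectral_norm_sq


# Diagonal example: singular values 3, 1, 1, so ||A||^2 = 9 and ||A||_F^2 = 11.
A = [[3.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
rho = numerical_rank(A, 9.0)
```

With a relative error estimate of $\|A\|^2$, the computed ratio inherits a comparable relative error, since the Frobenius norm is computed exactly.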



2 Estimating the Spectral Norm 

The algorithm has two basic steps. 

1: Obtain a sketch $\tilde{A}$ of $A$ which has smaller size than $A$ but for which $\|\tilde{A}\| \approx \|A\|$.
2: Obtain $\|\tilde{A}\|$ using the power iteration method.

For step 1, we use an estimate proven in Magen and Zouzias (2010), and independently in Magdon-Ismail
(2010). Let $A = [a_1, \ldots, a_n]^T$, where $a_i$ are the rows of $A$. Define probabilities

$$p_i = \frac{\|a_i\|^2}{\|A\|_F^2},$$

where $\|A\|_F^2 = \sum_{i=1}^n \|a_i\|^2$ is the squared Frobenius norm of $A$. Note that all $p_i$ can be computed in
$O(\mathrm{nnz}(A))$ time. Fix integer $r \ge 1$; we construct $\tilde{A} \in \mathbb{R}^{r \times d}$ as follows. Let $Z$ be a vector valued
random variable taking on the $n$ values $\{a_1/\sqrt{r p_1}, \ldots, a_n/\sqrt{r p_n}\}$, with probabilities $\{p_1, \ldots, p_n\}$.
Let $z_1, \ldots, z_r$ be $r$ independent copies of $Z$; the rows of $\tilde{A}$ are the $z_j$, $\tilde{A} = [z_1, \ldots, z_r]^T$. Note that,
given the $p_i$, $\tilde{A}$ can be obtained in additional $O(n + r \log r)$ time.
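The sampling step can be sketched as follows, in dependency-free Python for readability. The helper name `row_sampling_sketch` is hypothetical; the code assumes a dense matrix stored as a list of rows with at least one non-zero row, and it does not attempt to reach the stated $O(n + r \log r)$ sampling time.

```python
import math
import random


def row_sampling_sketch(A, r, rng):
    """Sample r rescaled rows of A with probabilities p_i = ||a_i||^2 / ||A||_F^2."""
    row_norms_sq = [sum(x * x for x in row) for row in A]
    frob_sq = sum(row_norms_sq)
    probs = [s / frob_sq for s in row_norms_sq]
    # random.choices draws r indices independently with the given weights;
    # rows with p_i = 0 are never selected.
    idx = rng.choices(range(len(A)), weights=probs, k=r)
    # Each sampled row a_i is rescaled by 1 / sqrt(r * p_i).
    return [[x / math.sqrt(r * probs[i]) for x in A[i]] for i in idx]


rng = random.Random(0)
A = [[1.0, 2.0], [0.0, 0.0], [3.0, -1.0], [0.5, 0.5]]
A_sketch = row_sampling_sketch(A, r=50, rng=rng)
frob_A = sum(x * x for row in A for x in row)
frob_sketch = sum(x * x for row in A_sketch for x in row)
```

Every rescaled row has squared norm $\|A\|_F^2 / r$, so the sketch reproduces the squared Frobenius norm of $A$ exactly (up to rounding), whatever rows happen to be drawn.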

Lemma 2 (Magdon-Ismail (2010)). For $\epsilon > 0$, if $r \ge (4d/\epsilon^2) \ln(2d/\delta)$, then w.p. at least $1 - \delta$,

$$\|A^T A - \tilde{A}^T \tilde{A}\| \le \epsilon \|A\|^2.$$



Corollary 3. For $\epsilon > 0$, if $r \ge (4d/\epsilon^2) \ln(2d/\delta)$, then w.p. at least $1 - \delta$,

$$(1 - \epsilon)\|A\|^2 \le \|\tilde{A}\|^2 \le (1 + \epsilon)\|A\|^2.$$

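Proof.

$$\|\tilde{A}\|^2 = \|\tilde{A}^T \tilde{A}\| = \|\tilde{A}^T \tilde{A} - A^T A + A^T A\| \le (1 + \epsilon)\|A\|^2;$$

$$\|A\|^2 = \|A^T A\| = \|A^T A - \tilde{A}^T \tilde{A} + \tilde{A}^T \tilde{A}\| \le \epsilon \|A\|^2 + \|\tilde{A}^T \tilde{A}\|.$$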


We have a sketch of $A$ which preserves the spectral norm; now, to obtain $\|\tilde{A}\|$, we use the
power iteration. Let $X \in \mathbb{R}^{r \times d}$ be an arbitrary matrix, and $x_0$ a unit vector. For $n \ge 1$, let
$x_n = X^T X x_{n-1} / \|X^T X x_{n-1}\|$. Note that multiplying by $X^T X$ can be done in $O(rd)$ operations.
Since $x_n$ is a unit vector, $\|X^T X x_n\| \le \|X\|^2$. Let $x_0$ be a random isotropic vector constructed
using $d$ independent standard Normal variates $z_1, \ldots, z_d$; so $x_0 = [z_1, \ldots, z_d]^T / \sqrt{z_1^2 + \cdots + z_d^2}$. Let
$\tilde{\lambda}_n^2 = \|X^T X x_n\|$ be an estimate for $\|X\|^2$ after $n$ power iterations.
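The power iteration just described can be sketched in plain Python as below. The function name `power_iteration_norm_sq` is hypothetical, the matrix is assumed to be a dense list of rows with $X \ne 0$, and the product $X^T X v$ is computed as $X^T (X v)$ so that $X^T X$ is never formed explicitly.

```python
import math
import random


def power_iteration_norm_sq(X, n_iters, rng):
    """Estimate ||X||^2 as ||X^T X x_n|| after n_iters power iterations."""
    d = len(X[0])

    def gram_apply(v):
        # Compute X^T (X v) with two matrix-vector products.
        Xv = [sum(row[j] * v[j] for j in range(d)) for row in X]
        return [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(d)]

    # Random isotropic start: a normalized vector of standard normals.
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm_z = math.sqrt(sum(t * t for t in z))
    x = [t / norm_z for t in z]
    for _ in range(n_iters):
        y = gram_apply(x)
        norm_y = math.sqrt(sum(t * t for t in y))  # assumes X is not the zero matrix
        x = [t / norm_y for t in y]
    y = gram_apply(x)
    return math.sqrt(sum(t * t for t in y))


# Singular values 3 and 1, so ||X||^2 = 9.
X = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
est = power_iteration_norm_sq(X, n_iters=30, rng=random.Random(1))
```

Because the iterate is always a unit vector, the returned value never exceeds $\|X\|^2$; the estimate approaches it from below as the number of iterations grows.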
Lemma 4. For $\epsilon > 0$ and a constant $c \le (2/\sqrt{\pi} + 2)^3$, with probability at least $1 - \delta$,

$$\tilde{\lambda}_n^2 \ge \frac{\|X\|^2 (1 - \epsilon)}{\sqrt{1 + \frac{cd}{\delta^3} (1 - \epsilon)^{2(n+1)}}}.$$



It immediately follows that for some constant $c$, if $n \ge (c/\epsilon) \log(d/\delta\epsilon)$, then $\tilde{\lambda}_n^2 \ge (1 - \epsilon)\|X\|^2$.
Since each power iteration takes $O(rd)$ time, and we run $O((1/\epsilon) \log(d/\delta\epsilon))$ power iterations, the
running time is $O((rd/\epsilon) \log(d/\delta\epsilon))$. Applying this to the estimate $\tilde{A}$ from Lemma 2, with $r =
(4d/\epsilon^2) \log(2d/\delta)$, we get Theorem 1.
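Putting the two steps together, a minimal end-to-end sketch might look as follows. This is an illustration under stated assumptions, not a tuned implementation: the function name `estimate_spectral_norm_sq` is hypothetical, the constant `c` in the iteration count is not specified in the text and is set to 1 here purely for illustration, and the matrix is a dense list of rows with at least one non-zero row.

```python
import math
import random


def estimate_spectral_norm_sq(A, eps, delta, c=1.0, seed=0):
    """Sketch A by row sampling, then run power iteration on the sketch."""
    rng = random.Random(seed)
    n_rows, d = len(A), len(A[0])

    # Step 1: sample r rescaled rows with p_i = ||a_i||^2 / ||A||_F^2.
    r = math.ceil((4 * d / eps ** 2) * math.log(2 * d / delta))
    norms_sq = [sum(x * x for x in row) for row in A]
    frob_sq = sum(norms_sq)
    idx = rng.choices(range(n_rows), weights=norms_sq, k=r)
    # Scaling by sqrt(||A||_F^2 / (r ||a_i||^2)) equals 1 / sqrt(r p_i).
    S = [[x * math.sqrt(frob_sq / (r * norms_sq[i])) for x in A[i]] for i in idx]

    # Step 2: power iteration on S^T S from a random isotropic start.
    # c = 1 is an assumed placeholder for the unspecified constant.
    n_iters = math.ceil((c / eps) * math.log(d / (delta * eps)))
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iters + 1):
        norm_x = math.sqrt(sum(t * t for t in x))
        x = [t / norm_x for t in x]
        Sx = [sum(row[j] * x[j] for j in range(d)) for row in S]
        x = [sum(S[i][j] * Sx[i] for i in range(r)) for j in range(d)]
    # x now holds S^T S x_n, whose norm is the estimate of ||S||^2.
    return math.sqrt(sum(t * t for t in x))


# 400 rows alternating e_1 and 0.5 * e_2, so A^T A = diag(200, 50) and ||A||^2 = 200.
A = [[1.0, 0.0] if i % 2 == 0 else [0.0, 0.5] for i in range(400)]
est = estimate_spectral_norm_sq(A, eps=0.5, delta=0.1)
```

In this toy example the sketch has 119 rows against 400 in $A$. When the prescribed $r$ exceeds the number of rows of $A$, sampling brings no saving and one would run the power iteration on $A$ directly.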

Proof. Assume that $x_0 = \sum_{i=1}^d \alpha_i v_i$, where $v_i$ are the eigenvectors of $X^T X$ with corresponding
eigenvalues $\sigma_1^2 \ge \cdots \ge \sigma_d^2$. Note, $\|X\|^2 = \sigma_1^2$. If $\sigma_d^2 \ge (1 - \epsilon)\sigma_1^2$, then it trivially follows that
$\|X^T X x_n\| \ge (1 - \epsilon)\sigma_1^2$ for any $n$, so assume that $\sigma_d^2 < (1 - \epsilon)\sigma_1^2$. We can thus partition the squared singular
values into those at least $(1 - \epsilon)\sigma_1^2$ and those which are smaller; the latter set is non-empty. So
assume for some $k < d$, $\sigma_k^2 \ge (1 - \epsilon)\sigma_1^2$ and $\sigma_{k+1}^2 < (1 - \epsilon)\sigma_1^2$. Since



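$$x_n = \frac{\sum_{i=1}^d \alpha_i \sigma_i^{2n} v_i}{\left(\sum_{i=1}^d \alpha_i^2 \sigma_i^{4n}\right)^{1/2}},$$

we therefore have:

$$\begin{aligned}
\frac{1}{\sigma_1^4} \|X^T X x_n\|^2
&= \frac{\sum_{i=1}^d \alpha_i^2 \sigma_i^{4(n+1)}}{\sigma_1^4 \sum_{i=1}^d \alpha_i^2 \sigma_i^{4n}}
 = \frac{\sum_{i=1}^d \alpha_i^2 (\sigma_i/\sigma_1)^{4(n+1)}}{\sum_{i=1}^d \alpha_i^2 (\sigma_i/\sigma_1)^{4n}} \\
&\ge \frac{\sum_{i=1}^k \alpha_i^2 (\sigma_i/\sigma_1)^{4(n+1)}}{\sum_{i=1}^k \alpha_i^2 (\sigma_i/\sigma_1)^{4n} + \sum_{i=k+1}^d \alpha_i^2 (\sigma_i/\sigma_1)^{4n}} \\
&\overset{(a)}{\ge} \frac{\sum_{i=1}^k \alpha_i^2 (\sigma_i/\sigma_1)^{4(n+1)}}{(1 - \epsilon)^{-2} \sum_{i=1}^k \alpha_i^2 (\sigma_i/\sigma_1)^{4(n+1)} + (1 - \epsilon)^{2n}} \\
&= \frac{1}{(1 - \epsilon)^{-2} + (1 - \epsilon)^{2n} / \sum_{i=1}^k \alpha_i^2 (\sigma_i/\sigma_1)^{4(n+1)}} \\
&\overset{(b)}{\ge} \frac{(1 - \epsilon)^2}{1 + \frac{1}{\alpha_1^2} (1 - \epsilon)^{2(n+1)}}.
\end{aligned}$$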



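(a) follows because for $i \ge k + 1$, $\sigma_i^2 < (1 - \epsilon)\sigma_1^2$; for $i \le k$, $\sigma_1^4/\sigma_i^4 \le (1 - \epsilon)^{-2}$; and $\sum_{i \ge k+1} \alpha_i^2 \le
\sum_{i \ge 1} \alpha_i^2 = 1$. (b) follows because $\sum_{i=1}^k \alpha_i^2 (\sigma_i/\sigma_1)^{4(n+1)} \ge \alpha_1^2$. The lemma now follows from the
next lemma by redefining $\delta = (2/\sqrt{\pi} + 2)(\delta')^{1/3}$.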

Lemma 5. With probability at least $1 - (2/\sqrt{\pi} + 2)(\delta')^{1/3}$, $\alpha_1^2 \ge \delta'/d$.
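As an informal sanity check of Lemma 5 (not a substitute for the proof), the following Monte Carlo sketch estimates how often $\alpha_1^2 < \delta'/d$ for a random isotropic start and compares it with the failure bound, reading the constant as $2/\sqrt{\pi} + 2$. The function name `alpha1_sq_failure_rate` and the chosen values of $d$, $\delta'$, and the trial count are assumptions made for this example.

```python
import math
import random


def alpha1_sq_failure_rate(d, delta_prime, trials, seed=0):
    """Empirical frequency of the event alpha_1^2 < delta' / d."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # With v_1 aligned to the first axis, alpha_1^2 = z_1^2 / sum_i z_i^2.
        alpha1_sq = z[0] ** 2 / sum(t * t for t in z)
        if alpha1_sq < delta_prime / d:
            failures += 1
    return failures / trials


delta_prime = 0.001
rate = alpha1_sq_failure_rate(d=20, delta_prime=delta_prime, trials=20000)
# Failure probability allowed by Lemma 5 for this delta'.
bound = (2.0 / math.sqrt(math.pi) + 2.0) * delta_prime ** (1.0 / 3.0)
```

The empirical rate is expected to sit well below the bound, which reflects the slack introduced by the Chebyshev step in the proof.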

To conclude the proof, we prove Lemma 5. It is clear that $E[\alpha_1^2] = 1/d$ from isotropy. Without
loss of generality, assume $v_1$ is aligned with the $z_1$ axis. So $\alpha_1^2 = z_1^2 / \sum_i z_i^2$ ($z_1, \ldots, z_d$ are
independent standard normals). For $\delta' \le 1$, we estimate $P[\alpha_1^2 \ge \delta'/d]$ as follows:



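$$\begin{aligned}
P\left[\alpha_1^2 \ge \frac{\delta'}{d}\right]
&= P\left[\frac{z_1^2}{\sum_{i \ge 1} z_i^2} \ge \frac{\delta'}{d}\right]
 = P\left[z_1^2 \ge \frac{\delta'}{d - \delta'} \sum_{i \ge 2} z_i^2\right] \\
&\ge P\left[z_1^2 \ge \frac{\delta'}{d - 1} \sum_{i \ge 2} z_i^2\right] \\
&\overset{(a)}{=} P\left[\chi_1^2 \ge \frac{\delta'}{d - 1} \chi_{d-1}^2\right] \\
&\overset{(b)}{\ge} P\left[\chi_1^2 \ge \delta' + (\delta')^{2/3}\right] \cdot P\left[\frac{\delta'}{d - 1} \chi_{d-1}^2 \le \delta' + (\delta')^{2/3}\right].
\end{aligned}$$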



In (a) we compute the probability that a $\chi_1^2$ random variable exceeds a multiple of an independent
$\chi_{d-1}^2$ random variable, which follows from the definition of the $\chi^2$ distribution as a sum of squares
of independent standard normals. (b) follows from independence and because one particular
realization of the event in (a) is when $\chi_1^2 \ge \delta' + (\delta')^{2/3}$ and $\delta' \chi_{d-1}^2/(d - 1) \le \delta' + (\delta')^{2/3}$. Since
$E[\chi_{d-1}^2/(d - 1)] = 1$ and $\mathrm{Var}[\chi_{d-1}^2/(d - 1)] = 2/(d - 1)$, by Chebyshev's inequality

$$P\left[\frac{\delta'}{d - 1} \chi_{d-1}^2 \le \delta' + (\delta')^{2/3}\right] \ge 1 - \frac{2 (\delta')^{2/3}}{d - 1}.$$

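From the definition of the $\chi_1^2$ distribution, we can bound $P[\chi_1^2 \le \delta' + (\delta')^{2/3}]$:

$$P\left[\chi_1^2 \le \delta' + (\delta')^{2/3}\right] = \frac{1}{2^{1/2}\,\Gamma(1/2)} \int_0^{\delta' + (\delta')^{2/3}} du\, u^{-1/2} e^{-u/2} \le \sqrt{\frac{2}{\pi}} \left(\delta' + (\delta')^{2/3}\right)^{1/2},$$

and so

$$P\left[\alpha_1^2 \ge \frac{\delta'}{d}\right] \ge 1 - \sqrt{\frac{2}{\pi}} \left(\delta' + (\delta')^{2/3}\right)^{1/2} - \frac{2 (\delta')^{2/3}}{d - 1} \ge 1 - \left(\frac{2}{\sqrt{\pi}} + 2\right) (\delta')^{1/3},$$

where the last step uses $\delta' \le (\delta')^{2/3} \le (\delta')^{1/3}$ for $\delta' \le 1$, and $d \ge 2$.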